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Abstract. Knowing how the Human brain is anatomically and function¬ 
ally organized at the level of a group of healthy individuals or patients 
is the primary goal of neuroimaging research. Yet computing an average 
of brain imaging data defined over a voxel grid or a triangulation re¬ 
mains a challenge. Data are large, the geometry of the brain is complex 
and the between subjects variability leads to spatially or temporally non¬ 
overlapping effects of interest. To address the problem of variability, data 
are commonly smoothed before performing a linear group averaging. In 
this work we build on ideas originally introduced by Kantorovich ^8] to 
propose a new algorithm that can average efficiently non-normalized data 
defined over arbitrary discrete domains using transportation metrics. We 
show how Kantorovich means can be linked to Wasserstein barycenters in 
order to take advantage of the entropic smoothing approach used by [7] . 
It leads to a smooth convex optimization problem and an algorithm with 
strong convergence guarantees. We illustrate the versatility of this tool 
and its empirical behavior on functional neuroimaging data, functional 
MRI and magnetoencephalography (MEG) source estimates, defined on 
voxel grids and triangulations of the folded cortical surface. 


1 Introduction 

Computing the average of some observations may seem like a trivial problem, yet 
it remains an active topic of research in mathematics, statistics and applications 
such as medical imaging. The problem of atlas computation from images m , or 
meshes m, or the problem of group analysis from functional imaging data [25] 
are particularly relevant for this field. The challenge is that natural phenomena 
are usually described in terms of physical and temporal event locations, along 
with their intensity. While Euclidean averaging is standard and has some benefits 
such as low computation time, this procedure ignores the geometry of the space 
the observations belong to; the image of an average brain image obtained by 
Euclidean averaging of individual voxels does not yield the image of the brain 
of an average individual. 

Starting from observations defined on a regular or irregular grid, our aim is to 
provide a model-free approach to average them that only builds upon geometric 
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arguments. An example of such data are functional MRI (fMRI) data defined 
on a voxel grid or a triangulated cortical surface. The approach aims to be 
intrinsically geometric in the sense that it only requires the prior knowledge of a 
metric between the locations on the grid. The technique aims to be versatile in 
the sense that it can be applied to weighted samples taking values on a discretized 
space with no assumptions on the regularity of the metric. 

The approach we propose is inspired by optimal transport theory [26] and can 
be seen as an extension of the Wasserstein barycenter problem HEIIT], which 
aims at estimating a probability measure which bests approximates a family of 
probability measures in the Wasserstein metric sense. The challenge of using 
optimal transport in the setting we consider comes from the fact that Wasser¬ 
stein distances (and their barycenters) are defined for probability measures only. 
While some data are normalized such the Orientation Diffusion Function (ODF) 
in diffusion MRI [T0|, a number of medical imaging data are non-normalized. 
Here we bypass this limitation using a generalization of optimal transport dis¬ 
tances proposed by m- This extension comes at a price, since it introduces an 
additional parameter (the cost of adding or removing mass) which can be diffi¬ 
cult to tune. We propose a simple way to mitigate this problem by introducing 
a natural constraint on the overall mass of the barycenter, which we set to be 
equal to the average of the masses of all samples. We provide an efficient method 
to compute Kantorovich means by building upon the first algorithm of [7]. We 
provide intuitions on the behavior of our method and demonstrate its relevance 
with simulations and experimental results obtained with fMRI and MEG data 
which are neuroimaging data defined on a voxel grid or a triangulation of the 
folded cortical mantle. 

This paper is organized as follows. We start in Section with a reminder on 
optimal transport and the Kantorovich metric for non-normalized measures. We 
introduce Kantorovich means in Section [^ and describe efficient algorithms to 
compute them. Section [^ contains simulations and results on publicly available 
fMRI data with 20 subjects and MEG data with 16 subjects. 


2 Optimal Transport Between (Un)Normalized Measures 

We introduce in this section Wasserstein distances for non-negative measures on 
a finite metric space (i?, D). Simply put, we consider normalized histograms on a 
grid of locations, and assume the distance between any two locations is provided. 
Next, we extend Wasserstein distances to non-negative vectors of bounded mass. 

Notations. Let d be the cardinal of 17. Relabeling arbitrarily all the elements 
of 17 as {1, • • • , d} we represent the set of non-negative measures on 17 by the 
non-negative orthant Let be the d-dimensional vector of ones. Eor any 

vector rt G we write \u\i for the h norm of u, When u G we also 

call \u\i the (total) mass of vector u. Let M = [mij] G be the matrix that 

describes the metric between all locations in 17, namely mij = D{i^j)^i^j < d. 
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Wasserstein Distance for Normalized Histograms. Consider two vectors a, 6 G 
such that \a\i = \b\i. Both can be interpreted as histograms on i? of the 
same mass. A non-trivial example of such normalized data in medical imaging 
is the discretized ODF used for diffusion imaging data m- For p > 1, the p- 
Wasserstein distance Wp{a^b) between a and b is the root of the optimum 
of a linear program, known as a transportation problem H §7.2], A transport 
problem is a network flow problem on a bipartite graph with cost (the 
pairwise distance matrix M raised element-wise to the power p), and feasible 
set of ffows U{a,b) (known as the transportation polytope of a and 6 ), where 
U{a^b) is the set of d x d nonnegative matrices such that their row and column 
marginals are equal to a and b respectively: 

Uia, b) {T e I Tld = a, = 6}. (1) 

Given the constraints induced by a and 6 , one naturally has that U{a, b) is empty 
when |a|i 7 ^ | 6 |i and non-empty when |a|i = | 6 |i (in which case one can easily 
check that the matrix a 6 ^/|a|i belongs to that set). The p-Wasserstein distance 
kFp(a, b) raised to the power p (written Wf{a^ b) below) is equal to the optimum 
of a parametric Optimal Transport (OT) problem on df variables, 

WP(a, b) = OT(a, b, M^) min (T, ), (2) 

parameterized by the marginals a, b and matrix M^. 

Optimal Transport for Unnormalized Measures. If the total masses of a and b 
differ, namely |a|i 7 ^ | 6 |i, the definition provided above is not useful because 
U{a,b) = 0. Several extensions of the OT problem have been proposed in that 
setting; we recall them here for the sake of completeness. In the computer vi¬ 
sion literature, [23] proposed to handle that case by: (i) relaxing the equality 
constraints of U{a,b) to inequality constraints Tl^ < a, < 6 in Equa¬ 

tion 0 ; (a) adding an equality constraint on the total mass of the solution 
I'^Tld = min(|a|i, | 6 |i); (Hi) dividing the minimum of (T^M) under constraints 
(i,ii) by min(|a|i, | 6 |i). This modification does not, however, result in a metric. 
[2Qj proposed later a variant of this approach called EMD-hat that incorporates 
constraints (i,n) but (iii^) adds to the optimal cost {T'^^M) a constant times 
min(|a|i, \b\i). When that constant is large enough M, [20] claim that EMD-hat 
is a metric. We also note that [ 2 ] proposed a quadratic penalty between the dif¬ 
ferences of masses and made use of a dynamic formulation of the transportation 
problem. 

Kantorovich Norms for Signed Measures. We propose to build on early contribu¬ 
tions by Kantorovich to define a generalization of optimal transport distance for 
unnormalized measures, making optimal transport applicable to a wider class of 
problems, such as the averaging of functional imaging data. [18] proposed such 
a generalization as an intermediary result of a more general definition, the Kan¬ 
torovich norm for signed measures on a compact metric space, which was itself 
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extended to separable metric spaces by m- We summarize this idea here by 
simplifying it to the case of interest in this paper where i? is a finite (of size d) 
probability space, in which case signed measures are equivalent to vectors in 
[18] propose first a norm for vectors z in the orthogonal of (vectors 2 : such 
that z'^ld = 0), by considering the 1-Wasserstein distance between the positive 
and negative parts of 2 ;, \\z\\k = Wi( 2 ;+, 2 ;_). A penalty vector A G is then 
introduced to define the norm \\x\\k of any vector x as the minimal value of 
II 2 ; II X + A^\z — x\ when 2 ; is taken in the space of all vectors 2 ; with zero sum, 
and \z — x\ is the element-wise absolute value of the difference of vectors 2 : and 
X. For this to define a true norm in A must be such that Ai > maxj niij 
and \Ai — Aj\ < mij. The distance between two arbitrary non-negative vectors 
a, 6 of different mass is then defined as ||a — 6 ||x. As highlighted by [26[ p.l08], 
and if we write for the vector of the canonical basis of this norm is the 
maximal norm in such that for any i, j < d, ||e^ — namely the 

maximal norm in the space of signed measures on i? such that the norm between 
two Dirac measures coincides with i7’s metric between these points. 


Kantorovich Distances for Unnormalized Nonnegative Measures. [T4| noticed 
that Kantorovich’s distance between unnormalized measures can be cast as a 
regular optimal transport problem. Indeed, one simply needs to: (i) add a virtual 
point uj to the set Q = {!,••• ,d} whose distance D{i^uj) = D{uj^i) to any 
element i in i? is set to ; (ii) use that point cj as a buffer when comparing 
two measures of different mass. The appeal of Kantorovich’s formulation in the 
context of this work is that it boils down to a classic optimal transport problem, 
which can be approximated efficiently using the smoothing approach of |6| as 
discussed in Section To simplify our analysis in the next section, we only 
consider non-negative vectors (histograms) a G such that their total mass 
is upper bounded by a known positive constant. This assumption alleviates the 
definition of our distance below, since it does not require to treat separately the 
cases where either |a|i > |6|i or |a|i < |6|i when comparing a, 6 G Note also 
that this assumption always holds when dealing with finite collections of data. 
Without loss of generality, this is equivalent to considering vectors a in such 
that |a|i < 1 with a simple rescaling of all vectors by that constant. We define 
next the Kantorovich metric on 5 '^, where Sd = {u ^ \u\i < 1 }. 

Definition 1 (Kantorovich Distances on Sd)» Let A G such that Ai > 
maxj iTiij and \Ai — Aj\ < rriij. Let p > 0. For two elements a and b of Sd, we 
write 0 = 1 — |a|i>0 and /S = 1 — \b\i > 0. Their p-Kantorovich distance raised 
to the power p is 


KP^{a,b) = OT{[-],[^p],MP), where M 


' M A 
A'P' 0 


1 X d -\-1 


(3) 


The Kantorovich distance inherits all metric properties of Wasserstein distances: 
the mapping which to a vector a associates a vector [a; 1 — |a|i] G A’^+i can be 
regarded as a feature map, to which the standard Wasserstein distance using M 
(which is itself a metric matrix) is applied. 
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3 Kantorovich Mean of Unnormalized Measures 


Consider now a collection {6^, • • • ,6^} of N non-negative measures on {Q^D) 
with mass upper-bounded by 1, namely N vectors in Sd- Let = 1 — |6-^ | be 
the deficient mass of ¥. Our goal in this section is to find, given a vector of 
virtual costs A and an exponent p, a vector a in Sd which minimizes the sum of 
its p-Kantorovich distances to all the ¥, 


1 ^ • 1 ^ 
a € argmin— K^^iu, V) = argmin— OT( [ 


ueSd 


i=i 


ueSd 


i=i 


,M^). (PI) 


Because of the equivalence between Kantorovich distances for points in Sd and 
Wasserstein distances in the d + 1 simplex, this problem can be naturally cast 
as a Wasserstein barycenter problem [T] with metric M. Problem (PI) can be 
cast as a linear program with N{d-{- 1)^ variables. For the applications we have 
in mind, where d is of the order or larger than 10^, solving that program is not 
tractable. We discuss next computational approaches to solve it efficiently. 


Smooth Optimal Transport. [22] and [5] have proposed efficient algorithms to 
solve the Wasserstein barycenter problem in low dimensional Euclidean spaces. 
These approaches are not, however, suitable when one considers observations 
on the cortex, for which all pairs shortest path metrics (inferred from a graph 
structure connecting all voxels) are preferred over Euclidean metrics. To solve 
Problem @ we turn instead to a recent series of algorithms proposed in [7], 
[3] and [8] that all exploit the regularized OT approach suggested in [6] . Among 
these recent approaches, we propose to build in this work upon the first algo¬ 
rithm in [7], which can be easily modified to incorporate constraints on a. This 
flexibility will prove useful in the next section. 

The strategy of [7] is to regularize directly the optimal transport problem by 
an entropic penalty, whose weight is parameterized by a parameter A > 0, 

OTx(a,b,MP)=^ min (T, ) - Ih(T), 

TeU(a,b) A 

where H (T) stands for the entropy of the matrix T seen as an element of the sim¬ 
plex of size d^, H(T) tij \og{tij). As shown by [7], the regularized trans¬ 

port problem OT^ admits a unique optimal solution. As such, OTx{a^b^ M^) 
is a differentiable function of a whose gradient can be recovered through the 
solution of the corresponding smoothed dual optimal transport. Without elabo¬ 
rating further on this approach, we propose to simply replace all expressions that 
involve an optimal transport problem OT in our formulations by their smoothed 
counterpart OT^. 


Sensitivity of Kantorovieh Means to the Parameter A. The magnitude of the 
solution a to Problem 0 depends directly on the virtual distance A. Suppose, 
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for instance, that A = eld with 5 arbitrarily small. In that case a should con¬ 
verge to a unit mass on the last (virtual) bin and would therefore be equal to 
the null histogram 0^^ on the d other bins. If, on the contrary, Z\ = 7 !^ and 7 is 
large, we obtain that Kp^{a,b) grows as | \a\i — \b\i\. Therefore a minimum 
of Problem ( |P1[ ) would necessarily need to have a total mass that minimizes 

I |<^|i — I 1 1 1, namely a total mass equal to the median mass of all ¥. This 
sensitivity of the solution a to the magnitude of A may be difficult to control. 
Choosing adequate values for Z\, namely setting the distance of the virtual point 
to the d other points, may also be a difficult parameter choice. To address this 
issue we propose to simplify our framework by introducing an equality constraint 
on the mass of the barycenter a in our definition, and let A be any non-negative 
vector, typically set to a large quantile of the distribution of all pairwise distances 
Mfj times the vector of ones Id. Under these assumptions, we can now propose 
p-Kantorovich means with a constraint on the total mass of a. Remaining pa¬ 
rameters in our approach are therefore only p and A. In practice we will fix p = 1 , 
which corresponds to the Earth Mover’s Distance m, and use a high A, namely 
a small entropic regularization of order 1 /A, which has also the merit of making 
Problem strongly convex. A is set in our experiments to 100 /median(M), 
where median(M) is the median of all pairwise distances 

Definition 2 (p-Kantorovich Means with Constrained Mass). Let A G 

and p > 0. A Kantorovich mean with a target mass p < 1 of a set of N 
histograms {6^, • • • ,6^} in Sd is the unique vector a in Sd such that: 

a G argmin^ OTA(a, ^, M^). 
aeSd ^ ^ 

\a\i=p ^ 

We provide in Algorithmj^an implementation of [71 Alg.l]. Unlike their version, 
we only consider a fixed step-length exponentiated gradient descent, and add a 
mass renormalization step. We set the default mass of the barycenter to be the 
mean of the masses of all histograms. We use the notation o for the elementwise 
(Schur) product of vectors. Note that the computations of N dual optima in line 
7 of Algorithmic below can be vectorized and computed using only matrix-matrix 
products. We use GPGPUs to carry out these computations. 

4 Application to the Averaging of Neuroimaging Data 

Neuromaging data are defined on a grid of voxels, eventually restricted to the 
brain volume, or on a triangulation of the cortical mantle obtained by segmenta¬ 
tion of MRI data. Examples of data most commonly analyzed on a grid are fMRI 
data, while neural activity estimates derived from MEG/EEG data are often re¬ 
stricted to the cortical surface [9]. Anatomical data such as cortical thickness, 
which is a biomarker of certain neurodegenerative pathologies, is also defined on 
the surface. In all cases the data are defined on a discrete set of points (voxels or 
vertices) which have a natural distance given by the geometry of the brain. The 
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Algorithm 1 p-Kantorovich Barycenter with Constrained Mass 


1 : 

2 : 

3: 

4: 

5: 

6 : 

7: 

8 : 

9: 

10 : 

11 : 


Input: {b^, • • • , b^} C metric M, quantile p > 0, entropic regularizer A > 0, 
step size c. 

Compute mean mass p = ^ 

Form virtual cost vector A = quantile(M, q%)ld. 

Form augmented d + 1 x d + 1 ground metric M as in Equation § 

Set a = ld+i/(d + 1). 
while a changes do 

Compute all dual optima of OTa(ci, , M) using [71 Alg.3] 
a ^ a o exp(—(gi'cidient update) 


ai 


1 - P 


if z < d, 
if z = d + 1. 


(projection on the simplex/mass constraint) 


end while 


Output ai:d G Sd- 


data are also non-normalized as they represent physical or statistical quantities, 
such as thickness in millimeters or F statistics. Such data are therefore particu¬ 
larly well adapted to the algorithm proposed in this paper: they are defined on 
discrete space, are non-normalized and there exists a natural ground metric. 

The difficulty when averaging neuroimaging data is the anatomo-functional 
variability: every brain is different. The standard approach to compensate for 
this variability across subjects is to smooth the data to favor the overlap of 
signal of interest after the individual data have been ported to a common space 
using anatomical landmarks (anatomical registration). Volume data are typically 
redefined in MNI space while surface data are transferred to an average cortical 
surface, using for instance the FreeSurfer softwar^ When working with fMRI 
the spatial variability is commonly compensated by smoothing the data with an 
isotropic Gaussian kernel of Full Width at Half Maximum (FWHM) between 
6 and 8 mm. MEG and EEG suffer from the same spatial variability, but also 
from the temporal variability of neural responses which is compensated by low 
pass filtering the data. By employing a transportation metric informed by the 
geometry of the domain, this smoothing procedure as well as the setting of the 
kernel bandwidth are not needed. 

In the following experiments we focus on spatial averaging, although exten¬ 
sion to spatiotemporal data is straightforward provided the metric is defined 
along the time axis. When working with a voxel grid the distance is the Euclid¬ 
ian distance taking into account the voxel size in millimeters, and when working 
on a cortical triangulation, the distance used is the geodesic distance computed 
on the folded cortical mantle. We now present results of a simulation study 
where standard averaging with Gaussian smoothing is compared to Kantorovich 
means. Simulation results are followed by experimental results obtained with 
fMRI data from 20 subjects and MEG data on a population of 16 subjects. 


^ https://surf er.nmr.mgh.harvard.edu/ 
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Fig. 1. Simulation results with focal random signals generated in areas/labels BA45 
(yellow) and MT (red) in a group of 100 subjects. Data are defined on a surface with 
10,024 vertices (FreeSurfer fsaverage 5). One shows the standard averaging referred 
to as Mean, the averaging after Gaussian smoothing is referred to as Mean (S) (mean 
after Gaussian smoothing with FWHM=8mm), and the Kantorovich mean (p=l). The 
result Mean shows the focal signals with random positions in the labels delineated in 
green. The Kantorovich mean highlights clear foci of activations in the ROIs without 
smearing the activation as with Gaussian smoothing which furthermore significantly 
dampens the amplitudes. 


Simulation setup. In this experiment using on a triangulation of the cortex, we 
simulated signals of interest in two brain regions using the functional parcella- 
tion provided by the FreeSurfer software. We used regions Broadman area 45 
(BA45) and the visual area MT. We simulated for a group of 100 subjects ran¬ 
dom positive signals in these two regions. For each subject and each region, the 
signal is focal at a random location with a random amplitude generated with a 
truncated Gaussian distribution (mean 5, std. dev. 1.). We use here focal signals 
to exemplify the effect of optimal transport. Such signals could correspond to 
dipolar activations derived from MEG/EEG using dipole fitting methods [24] or 
sparse regression techniques [27113]. 

Eigure presents the locations of the two regions (labels), the averages 
with and without Gaussian smoothing and the Kantorovich average. Gaussian 
smoothing leads to a highly blurred average which exceeds the extent of the 
regions of interest, while it also strongly reduces the amplitudes of the signals, 
potentially washing out the statistical effects. The peak amplitudes obtained 
with optimal transport are also higher and closer to the individual peak am¬ 
plitudes. One can clearly observe the limitations of Gaussian smoothing, which 
furthermore requires to set the bandwidth of the kernel. The Kantorovich aver¬ 
age nicely highlights two foci of signals at the group level. 


Results on fMRI data. We used here fMRI data analyzed on a voxel grid. It 
corresponds to 20 subjects from the database described in [21]. We average here 
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r t?. ■' 


y=-24 



Fig. 2. Averaging of the standardized effect of interest on fMRI data. From left to right, 
a) the Euclidian mean without smoothing, b) the Euclidian mean with smoothing 
(EWHM=8mm), c) the Kantorovich mean with Euclidian ground metric and p=l. 
The later result highlights a clear foci of activations in the ROI without smearing the 
activation nor damping the amplitudes as much as the kernel smoothing. 


the standardized effect of interest induced by left hand button press. In Fig. [^a 
we show the Euclidian average without smoothing. In Figj^b we report results 
obtained by classical averaging following Gaussian smoothing with FWHM of 
8 mm. Figj^c shows the Kantorovich mean with constrained mass. One can 
observe that this barycenter highlights a clear active region without requiring 
any kernel smoothing. It also leads to a amplitude in the average standardized 
effect around 1.7 which is much higher than the 0.23 obtained when smoothing. 

Results on MEG data. We now evaluate the benefit of the proposed approach on 
experimental data. These data were acquired with a Neuromag Vector View sys¬ 
tem (Elekta Oy, Helsinki, Finland) with 306 sensors arranged in 102 triplets, each 
comprising two orthogonal planar gradiometers and one magnetometer. Subjects 
are presented with images containing faces of familiar (famous) or unfamiliar per¬ 
sons and so called “scrambled” faces. See [16] for more details. Dataset contains 
16 subjects. For each one, event related fields (ERF) were obtained by averaging 
about 200 repetitions of recordings following stimuli presentations. Data were 
band pass filtered between 1 and 40 Hz. Following standard MEG source local¬ 
ization pipelines [12], a noise covariance was estimated from prestimulus time 
intervals and used for source reconstruction with the cortically constrained dSPM 
method [9]. The values obtained with dSPM can be considered as F statistics, 
where high values are located in active regions. 

In Figure we present results at a single time point, 190 ms after stimulus 
onset, which corresponds to the time instant where the dSPM amplitudes are 
maximum. Data correspond the visual presentation of famous faces. In green, is 
the border of the primary visual cortex (VI) provided by the FreeSurfer func¬ 
tional atlas. One can observe that the Kantorovich barycenter yields a more 
focal average nicely positioned in the middle of the calcarine fissure where VI 
is located. Such a strong activation in VI is expected in such an experiment 
consisting of visual stimuli. To investigate more subtle cognitive effects, such as 
the response of the fusiform face area (FFA) reported about 170 ms after stimu¬ 
lation in the literature we report results obtained on contrasts of ERFs 

measured after famous faces presentations vs. scrambled faces. As illustrated in 
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Figure Kantorovich mean nicely delineates a focal source of activity in the 
ventral part of the cortex known as the fusiform gyrus. 

These results show that Kantorovich means provides focal activation at 
the population level despite the challenging problem of inter-subject anatomo- 
functional variability. They avoid the smearing of the signal or statistical effects 
of interests which naturally occur when data are spatially smoothed before stan¬ 
dard averaging. Note again that here no smoothing parameter with FWHM 
in millimeters is manually specified. Their solution only depends on the met¬ 
ric naturally derived from the geometry of the cortical surface. With a cortical 
triangulation containing 10,024 vertices and 16 subjects the computation on a 
Tesla K40 GPU of one barycenter takes less than 1 min. 



Fig. 3. Average of dSPM estimates derived from MEG ERF data on a group of 16 
subjects stimulated with pictures of famous faces. From left to right: standard mean 
and Kantorovich mean. The left hemisphere is displayed in medial view. In green is 
the border of the primary visual cortex (VI) provided by FreeSurfer. One can observe 
that the Kantorovich mean has its peak amplitude within VI. 



Fig. 4. Group averages (16 subjects) of dSPM estimates derived from MEG ERE 
data obtained by contrasting the famous faces stimulation with the scrambled faces. 
Erom left to right: standard mean and the Kantorovich mean. The right hemisphere 
is displayed in ventral view. Optimal transport results highlight a focal activity in the 
Eusiform gyrus known to be implicated in face processing m- 


5 Conclusion 

The contributions of this work are two-fold. First, by considering non-normalized 
measures particularly relevant for medical imaging data we extend the current 
state of the art in barycenter estimation using transportation metrics. Follow¬ 
ing recent contributions on discrete optimal transport we propose a smoothed 
version of the transport problem that leads us to an efficient optimization algo- 
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rithm. While many contributions on optimal transport work only in one or two 
dimensions on a regular grid, our approach can cope with the complex geome¬ 
try of the brain (irregular grids and surfaces). Only the definition of a ground 
metric is here required. The algorithm proposed involves simple operations that 
are particularly adapted to modern GPU hardware and allows us to compute 
barycenters on full brain data in a few minutes. 

Second, with simulations defined on the cortex triangulation, a publicly avail¬ 
able fMRI dataset with 20 subjects and an MEG dataset processed with a 
standard analysis pipeline with 16 subjects we demonstrated the ability of the 
method to clearly highlight activation foci while avoiding the need to smooth 
the data. The fMRI data showed a clear activation in the right motor cortex and 
on the MEG data we showed that the proposed approach better identified acti¬ 
vation foci in the primary visual cortex and the fusiform gyrus. Both findings, 
that are consistent with previous neuroscience literature, show that method pro¬ 
posed yields more accurate results than the current pipelines which furthermore 
requires to set a kernel bandwidth parameter. The removal of any free parameter 
in the pipeline is a way towards more reproducible neuroimaging results. 

Due to the non-linearity of the approach the estimation of statistical thresh¬ 
old shall be performed with non-parametric permutation tests. When threshold¬ 
ing barycenters as presented in Section it is expected that one will obtain clear 
clusters. 
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